ABSTRACT


A separation algorithm applicable to the pattern classification and 
cluster analysis of n-dimensional (n > 2) data is presented. The algorithm
reduces the dimensionality of the problem by projecting each point into
a plane. This plane is presented to the user on a computer graphics
console screen. The operator picks a point on the screen with a lightpen
and chooses a "direction of movement" to achieve or increase separation, 
thereby causing an iteration of the algorithm. Each iteration is in fact 
a reorientation of the plane into which the data points are projected. 
Iterations continue until satisfactory separation is achieved. The 
algorithm is not restricted by the dimensionality of the data, nor are any 
distributional assumptions required. Results from six case studies 
indicate that the algorithm is a useful tool for the analysis of
multidimensional data.


I. INTRODUCTION


The separation algorithm presented in this paper is a development 
of an idea of Professor Thomas M. Cover of Stanford University. It pro- 
vides a tool for classifying multidimensional data sets. Two classification 
methods can be handled by the program which implements the algorithm; 
these two methods are pattern classification and cluster analysis. 
Meisel [4] discusses and contrasts these two processes. He describes 
pattern classification as the "process of developing on the basis of a
finite set of labelled samples, a decision rule with which we can classify
a point in the pattern space corresponding to an unlabelled sample."
Cluster analysis on the other hand is described as a process for "describing
and locating the discernible subsets of which a set of sample points of 
reasonable complexity is usually formed." 

Meisel notes that the difference between pattern classification and
cluster analysis is that the samples for the former are labelled, whereas
in the latter they are unlabelled. Thus it appears that the question of
whether a particular problem should be approached as a pattern classification
or cluster analysis problem depends on the use to be made of the
results and on the nature of the data available. In the pattern classification
problem the initial data set is divided into two (or more) subsets
which are identified. Each member of the data set is labelled as being
included in exactly one of these subsets. The decision rule developed
typically takes the form of a function whose argument is a data point.
When evaluated at elements of different subsets, the function assumes
values in disjoint ranges. For example, if the elements of a two-subset
problem are identified as X and Y respectively, and the discriminant
function is f, then f(X) might be negative for all X and f(Y) positive for
all Y. The discriminant function is then used to classify unlabelled points
Z as being in either the X or the Y subset.

In a cluster analysis problem, the initial data set has no previously
defined subdivisions. The purpose of the analysis is to decompose the
given set into a group of subsets. The number of subsets to be found
may be specified in advance or may be unknown. Assignment to the subsets
is made by utilizing similarities among the elements in the original data set.

Both pattern classification and cluster analysis techniques can be
described as being direct or indirect. In an indirect process, some
criterion function is used in the construction of the discriminant function
or is used to define the quality of clustering desired. A direct process
has no such criterion function; the formulation of the discriminant function
(or clusters) is accomplished by making immediate use of the data points.
The algorithm developed in this paper is a direct process.

It may be that for a particular problem, both pattern classification
and cluster analysis techniques are appropriate. For example, in a pattern
classification problem with two subsets forming the data set, the subsets
might be inspected separately using cluster analysis. Alternatively,
after successful usage of clustering techniques, a pattern classification
algorithm might be employed to define discriminant functions for the
clusters.

The separation algorithm presented in this thesis is implemented 
by a computer program which utilizes the graphics edit mode of operation 
provided by the XDS9300 digital computer combined with the Adage AGT-10 
graphics console. The man-machine interface is emphasized. The basic 
concept of the algorithm becomes apparent when a comparison is made 
with the concept of Fisher's Linear Discriminant. Duda and Hart [3] 
present Fisher's Linear Discriminant as a method for reducing the dimen- 
sionality of a pattern recognition problem. By projecting the points of 
the n-dimensional pattern space onto a straight line in the pattern space, 
the dimensionality of the problem is reduced from n to one. They further
point out that the projections of the points may fall on the line in a very
confused manner. By changing the position of the line in the n-space,
an orientation is sought in which the projections of the data points will
display separation. The achievement of this orientation, if possible, is
the goal of the discriminant analysis.

By way of comparison, the algorithm presented here also represents
a dimension reduction approach; however, the projection of the data
points is made onto a plane rather than a line. The dimension reduction is then
from n to two. If n is less than or equal to two, this algorithm is not
appropriate; more straightforward methods employing scatter plots, etc.,
can be employed. As in the Fisher method, the algorithm proceeds as a
series of reorientations of this plane in the n-space. Following each new
orientation the program presents the projections on the screen of the
graphics console. In a pattern recognition problem with two subsets,
each point's projection is represented by a video "Y" or "X". In a cluster
analysis each projection is presented as a "Y". With the XDS9300/AGT-10
combination in the graphics edit mode, the user decides whether or not
to make an iteration, and thus reorient the plane. The decisions if and
how to iterate are based on the situation presented on the console screen.
The importance of the man-machine interface should be clear. The iterative
process will be described in greater detail later. Note that the decision
of how to iterate and when to terminate rests with the user. He determines
when adequate separation has been attained.





II. A PREVIOUS COMPUTER-GRAPHICS APPLICATION 


This discussion of a report by Herman Chernoff entitled "THE USE
OF FACES TO REPRESENT POINTS IN n-DIMENSIONAL SPACE GRAPHICALLY"
[1] is presented as an example of the type of man-machine graphics
approach to multidimensional data analysis previously proposed. Chernoff
uses each n-dimensional vector corresponding to a data point to
mathematically determine the shape and features of a human face which is output
by a CALCOMP plotter. Chernoff's hypothesis is that points belonging to
the same class or cluster will be identified by the user due to the similarity
of the faces they generate. The user constructs the clusters by grouping
the faces into sets which show common features. As an illustrative
example, Chernoff applies his method to a data set of eighty
eight-dimensional measurements of nummulitid specimens from the Eocene
Yellow Limestone Formation of Northwestern Jamaica. (This data set will
also be used later as an illustrative example for this paper.) Chernoff's
results indicated the existence of three clusters in the data. These results
correspond exactly to the results obtained by Wright and Switzer [6], who
applied a different cluster analysis scheme to the same data set.

In constructing faces Chernoff can handle up to eighteen-dimensional
data to determine such features as the shape and spacing of the
eyes, size and curvature of the mouth, and the shape of the head.
Clustering obtained by Chernoff's method may depend on which components
of the data vectors control the various facial features; it is not clear
how this assignment should be made before the problem is started.
Chernoff dismisses any criticism of the eighteen dimension limitation,
saying that any further increase in dimensionality could be overcome by
adding features to the faces. This is an interesting point; it would seem
that if too many features were added to the faces, the decision process
would be more complicated. Chernoff notes that "when the eyes are very
small, the pupils become hard to detect." Also he notes that some
information is lost when the two ellipses which form the head are not
differentiable and the face is circular. Certainly, within any moderate
range of dimensional values, and as noted by Chernoff, this method
"provides a promising approach for a first look at multivariate data which
is effective in revealing rather complex relations not always visible from
simple correlations based on two-dimensional linear theories."





III. DISCUSSION OF THE ALGORITHM 


As indicated in Section I the separation algorithm developed in this 
paper is applicable to both the pattern classification and the cluster 
analysis problems. The concept of operation of the algorithm in each of 
these applications is the subject of the following paragraphs. 

For the pattern classification problem the analysis is conducted
on an n-dimensional data set which is divided into two subsets. These
subsets might represent two classes of objects, $P_1$ and $P_2$. Each object
would be characterized by an n-vector (or pattern) of measurements. Each
element of the pattern is referred to in the literature as a feature. The
goal of the pattern classification analysis is to determine a function
(called the discriminant function) $f: R^n \to R$, where $R^n$ is real n-space,
which can be used to assign an unlabelled pattern Z to either $P_1$ or $P_2$.

A scheme for assigning Z might be:

$$f(Z) < 0 \Rightarrow Z \in P_1$$
$$f(Z) > 0 \Rightarrow Z \in P_2$$

If a discriminant function of reasonably low complexity can be determined
for the two classes, the classes are separable; if no such function is
found, the classes are inseparable. Since the discriminant function is
constructed using samples from each of the classes, it must be determined
that the separation between the classes is sufficient to warrant the use
of the discriminant function in making class assignments. For example,
for two classes to be linearly separable, the convex hulls of the two sets
of projections should be disjoint.

For purposes of illustration, suppose the initial
data set has N members in $P_1$ and M members in $P_2$. The $P_1$ subset is
read by the computer as Y vectors; thus the program reads N vectors $Y_i$,
i=1,N. Similarly the $P_2$ subset is read as a set of X vectors, $X_j$,
j=1,M. The dimension n of these vectors should be greater than two.
(If n is two or less the problem can be handled graphically.) The algorithm
proceeds to the dimension reduction task as follows. The user chooses two
non-zero vectors A and B in Euclidean n-space. A linear transformation
is established as $T_{A,B}(Z) = (A \cdot Z, B \cdot Z)$, where
$A \cdot Z = \sum_{i=1}^{n} a_i z_i$ is the inner product and $Z \in R^n$.
This transformation, which is a projection of the point
Z into the plane defined by the vectors A and B, is the algorithm's vehicle
for dimension reduction. The projection is not generally orthogonal. The
transformation is applied to each member of $P_1$ and $P_2$; the two sets of
coordinate pairs are then scaled. In order to be presented on the screen
of an AGT console, a coordinate pair (x,y) must have values $-1 \le x,y \le 1$;
it is to obtain values within this range that the scaling is done. The
scaling is accomplished by searching through both sets of projection
pairs to find the maximum absolute value of the first coordinates (called
TMAX) and the maximum absolute value of the second coordinates (called
SMAX). Then the typical $(A \cdot X, B \cdot X)$ becomes
$(A \cdot X/\mathrm{TMAX}, B \cdot X/\mathrm{SMAX})$,
and the typical $(A \cdot Y, B \cdot Y)$ becomes
$(A \cdot Y/\mathrm{TMAX}, B \cdot Y/\mathrm{SMAX})$. Thus the
range of values for all pairs of the problem is as required by the AGT
console. The resulting A-B plane is graphically presented to the user on
the AGT screen. It is for the user to decide, based on information gained
from the screen, how to proceed. In all but the most extraordinary
case the initial picture will show some intermingling of the X's and Y's.
The heuristic argument which follows provides a reasonable guide for
proceeding.

The user is seeking a presentation which depicts the X's and Y's
in disjoint concentrations. Frequently the initial view of the A-B plane
will yield an indication of where these concentrations will develop. If
this is not the case, and provided there is a solution, such an indication
should appear after a sequence of "random" iterations. The user then
picks the X or Y which is deepest in the opposing concentration and
"moves" it toward its proper concentration. The user implements these
decisions with the box of sixteen function switches and the lightpen
attached to the console. The positions of the function switches and the
lightpen relative to the graphic console are indicated in Figure III-1.
The user makes his choice by depressing the function switch marked
"PICK X" or "PICK Y" and placing the tip of the lightpen over the X or Y.
Figure III-2 shows a user at the console ready to pick a point with the
lightpen. The point seen by the lightpen partially vanishes; that is, a
Y might become a V, and an X is similarly reduced. When the desired point
disappears, the user completes the choice by depressing the switches
marked "COMPLETE CHOICE" and "END EDIT".

Figure III-1. AGT console

Figure III-2. Operator sitting at AGT console

Ready to move, the user depresses one of four switches (LEFT, RIGHT, UP, or DOWN) to
indicate the desired direction of motion. Figures III-3 and III-4 give an
indication of what would occur on two iterations. In Figure III-3 the user
picks the Y indicated and iterates with a move to the right. Then in
Figure III-4 he picks an X and iterates with a move to the left. The
iteration step consists of the modification of the A and B vectors, the
recomputation and rescaling of the projections, and finally the representation
of these projections on the AGT screen. The modified values of the A
and B vectors depend on the point picked and the direction of motion chosen.
If the point picked is the projection of data point Z, the new A and B
vectors will be as follows:


Direction of Move    NEW A    NEW B
LEFT                 A-Z      B
RIGHT                A+Z      B
UP                   A        B+Z
DOWN                 A        B-Z


The new A and B vectors are used in the recomputation of the T transfor- 
mations. 
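
The table above can be sketched in a few lines of Python. This is only an illustration: the original program was written in FORTRAN IV, and the function names `dot` and `reorient` are hypothetical. The example also shows why a RIGHT move shifts the picked point to the right before rescaling: its first coordinate grows by Z·Z, which is never negative.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def reorient(A, B, Z, direction):
    """Return the new (A, B) after picking data point Z and a direction.

    LEFT and RIGHT modify A; UP and DOWN modify B, as in the table.
    """
    if direction == "LEFT":
        return [a - z for a, z in zip(A, Z)], list(B)
    if direction == "RIGHT":
        return [a + z for a, z in zip(A, Z)], list(B)
    if direction == "UP":
        return list(A), [b + z for b, z in zip(B, Z)]
    if direction == "DOWN":
        return list(A), [b - z for b, z in zip(B, Z)]
    raise ValueError("direction must be LEFT, RIGHT, UP, or DOWN")


# Illustrative values only (not taken from the thesis).
A, B = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
Z = [0.5, -1.0, 2.0]
new_A, new_B = reorient(A, B, Z, "RIGHT")

# Change in the picked point's unscaled first coordinate: equals Z.Z.
shift = dot(new_A, Z) - dot(A, Z)
```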

Having obtained the desired separation, the user causes the line
printer (XDS9300 output device) to print a six by six inch reproduction
of the projections in the A-B plane. Using this printout he may fit a
suitable discriminant function between the two concentrations.
Figure III-5 represents such an output with the discriminant function
drawn in. The curve f(Z) = 0 in this case is a straight line, and the
equation of the discriminant function would be:

$$f(Z) = \alpha (A \cdot Z/\mathrm{TMAX}) + \beta (B \cdot Z/\mathrm{SMAX}) + \delta$$

where $\alpha$, $\beta$, and $\delta$ are scalars. The vectors A and B and the scaling
factors TMAX and SMAX have values corresponding to the printout. With
the specification of the discriminant function, the pattern classification
is completed.

Figure III-4. Iteration by picking an X and moving it left

Figure III-5. Output with discriminant function (curve of discriminant function: f=0)
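
As a rough illustration of how such a linear discriminant could be applied to an unlabelled point, the sketch below evaluates f(Z) and applies the assignment scheme given earlier in this section (negative values to the first class, positive values to the second). The names `discriminant` and `classify`, the class labels, and all numerical values are assumptions made for the example; the point Z is assumed to have been centered already.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def discriminant(Z, A, B, tmax, smax, alpha, beta, delta):
    """Evaluate f(Z) = alpha*(A.Z/TMAX) + beta*(B.Z/SMAX) + delta."""
    return alpha * dot(A, Z) / tmax + beta * dot(B, Z) / smax + delta


def classify(Z, A, B, tmax, smax, alpha, beta, delta):
    """Assign Z to 'P1' if f(Z) < 0, to 'P2' if f(Z) > 0, else None."""
    f = discriminant(Z, A, B, tmax, smax, alpha, beta, delta)
    if f < 0:
        return "P1"
    if f > 0:
        return "P2"
    return None  # the point lies exactly on the line f = 0


# Hypothetical values standing in for those read off a printout.
A, B = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
tmax, smax = 2.0, 4.0
alpha, beta, delta = 1.0, -1.0, 0.5

label_1 = classify([1.0, 2.0, 3.0], A, B, tmax, smax, alpha, beta, delta)
label_2 = classify([-2.0, 4.0, 0.0], A, B, tmax, smax, alpha, beta, delta)
```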

The employment of the separation algorithm for cluster analysis is
very similar to that described for pattern classification. For a cluster
analysis problem the initial set of data points is not broken down into
subclasses; the program reads all patterns as Y vectors. The iterative
operation of the cluster separation is the same as the pattern classification
except that there are no X's present. When the user considers
that he has displayed a satisfactory set of clusters, he can have the line
printer output the six by six inch reproduction. The output also provides a
listing by point of the projection coordinates. Thus the cluster membership
of each point can be determined. As in the pattern classification
problem, no criterion function is provided to direct the procedure or to
signal termination. It rests with the user to make value judgments to
conduct the analysis. There is no guarantee of a unique clustering
scheme for a given data set; comparisons between different schemes
should provide a great deal of information.

As a final comment on the verbal description of the two types of
operations just presented, it should be noted that in reality the program
does not "move" any points. Instead, it is reorienting the plane into
which the points are projected by modifying the A and B vectors.
Separation is achieved when the A-B plane is properly oriented.

Mathematically the progress of the algorithm can be traced through
a series of steps as shown below. Suppose the initial data set consists
of k-dimensional vectors $Y_i$, i=1,N and $X_j$, j=1,M. In the cluster analysis
problem M=0. With the non-zero k-vectors A and B specified, the
algorithm proceeds:
1) Center the data sets by calculating a k-vector

$$C = \frac{1}{M+N}\left(\sum_{i=1}^{N} Y_i + \sum_{j=1}^{M} X_j\right)$$

and use C to center the X's and the Y's as follows:

$$X'_j = X_j - C, \quad j=1,M$$
$$Y'_i = Y_i - C, \quad i=1,N$$

2) Calculate the projection $T_{A,B}$ for all X and Y:

$$T_{A,B}(X'_j) = (A \cdot X'_j, B \cdot X'_j), \quad j=1,M$$
$$T_{A,B}(Y'_i) = (A \cdot Y'_i, B \cdot Y'_i), \quad i=1,N$$

3) Calculate the scaling parameters TMAX and SMAX:

$$\mathrm{TMAX} = \max\left(\max_j |A \cdot X'_j|, \max_i |A \cdot Y'_i|\right)$$
$$\mathrm{SMAX} = \max\left(\max_j |B \cdot X'_j|, \max_i |B \cdot Y'_i|\right)$$

4) Scale the $T_{A,B}$ projections:

$$T_{A,B}(X'_j) = (A \cdot X'_j/\mathrm{TMAX}, B \cdot X'_j/\mathrm{SMAX}), \quad j=1,M$$
$$T_{A,B}(Y'_i) = (A \cdot Y'_i/\mathrm{TMAX}, B \cdot Y'_i/\mathrm{SMAX}), \quad i=1,N$$

5) At this point M video X's and N Y's appear on the AGT screen.
If a Y is picked and a right move is selected, a new A = A+Y
is calculated. Thus at this point, either a new A vector or
a new B vector is found and control shifts to step 2.


The five steps listed above describe the activity in the program at each 


iteration. 
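
The five steps can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the thesis program (which is FORTRAN IV): the helper names are hypothetical, the guard against a zero maximum is an addition, and the centered version of the picked point is assumed to be the one added to A in step 5.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def center(points):
    """Step 1: subtract the mean vector C of all points (X's and Y's together)."""
    n = len(points[0])
    C = [sum(p[k] for p in points) / len(points) for k in range(n)]
    return C, [[p[k] - C[k] for k in range(n)] for p in points]


def project_and_scale(points, A, B):
    """Steps 2-4: project onto (A, B), then divide by TMAX and SMAX."""
    raw = [(dot(A, p), dot(B, p)) for p in points]
    tmax = max(abs(t) for t, _ in raw) or 1.0  # assumed guard against zero
    smax = max(abs(s) for _, s in raw) or 1.0
    return [(t / tmax, s / smax) for t, s in raw], tmax, smax


# Illustrative data: two Y's followed by two X's.
Ys = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
Xs = [[5.0, 6.0, 7.0], [7.0, 6.0, 5.0]]
C, centered = center(Ys + Xs)
A, B = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
screen, tmax, smax = project_and_scale(centered, A, B)

# Step 5: the first Y is picked and moved RIGHT, so A becomes A + Y'
# (centered point assumed); control then returns to step 2.
picked = centered[0]
A = [a + z for a, z in zip(A, picked)]
screen, tmax, smax = project_and_scale(centered, A, B)
```

Every pair in `screen` lies in the square required by the console, and the picked point ends at the right-hand edge of the display.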







IV. MOTIVATION OF THE ALGORITHM 


The attraction of the approach of this algorithm for separating data 
into its classes has a dual basis. Study of the iterative operation of the 
algorithm reveals as the first part of this basis the manner in which the 
correlation within data classes is exploited in seeking separation. The 
second grounds for confidence is the similarity of the algorithm's method 
to that proposed in the Perceptron Convergence Theorem. This similarity 
will be utilized to present a convergence proof for the algorithm. Each 
of these supporting concepts will be discussed in the following paragraphs. 

If the data set to be analyzed via the algorithm is separable, the
set can be considered to be made up of a number of subsets. For illustrative
purposes, suppose the data set consists of two subsets. The
assumption is made that there is a higher correlative measure within
these subsets than between them; it is this difference which has value
in the attainment of separation. If the two subsets are labelled X and Y,
then without considering the scaling operation, it is clear from the
discussion of the last section that the algorithm maps these sets onto a
plane as

$$(A \cdot X_j, B \cdot X_j) \text{ for } X_j \in X$$
$$(A \cdot Y_i, B \cdot Y_i) \text{ for } Y_i \in Y$$

Now suppose an iteration is made by picking an $X_k$ and moving to the right.
It has been seen that the new A is in fact $A + X_k$. The new mappings are

$$((A + X_k) \cdot X_j, B \cdot X_j) = (A \cdot X_j + X_k \cdot X_j, B \cdot X_j), \text{ for } X_j \in X$$
$$((A + X_k) \cdot Y_i, B \cdot Y_i) = (A \cdot Y_i + X_k \cdot Y_i, B \cdot Y_i), \text{ for } Y_i \in Y$$

The difference in the mappings (i.e., the separation gained) is seen to
be in the first coordinates as the difference between $X_k \cdot X_j$ and
$X_k \cdot Y_i$. Standard correlation concepts indicate that if the X set is
highly correlated, while the correlation between the X and Y sets is low, then the
product $X_k \cdot X_j$ will be greater than $X_k \cdot Y_i$. Clearly if some $Y_m$ is picked
the difference in the mappings caused by the iteration will be the difference
between $Y_m \cdot Y_i$ and $Y_m \cdot X_j$. In general the degree of separation gained
on a single iteration will tend to be small; however, it is the goal of the
algorithm to combine the incremental gains in separation of a sequence
of iterations to obtain the desired distinction between classes. 
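
A small numerical illustration of this argument follows. The data values and the function name `first_coordinate_gains` are invented for the example; the sketch only shows that, when the picked point is strongly correlated with its own subset, the increments to the first coordinate are larger for that subset than for the other.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def first_coordinate_gains(picked, points):
    """Increment picked.P added to each point's unscaled first coordinate
    when A is replaced by A + picked."""
    return [dot(picked, p) for p in points]


# Hypothetical centered data: the X's and Y's point in opposite directions.
Xs = [[2.0, 1.0, 0.0], [1.8, 1.2, 0.1], [2.2, 0.9, -0.1]]
Ys = [[-2.0, -1.0, 0.0], [-1.9, -1.1, 0.2], [-2.1, -0.8, -0.2]]

X_k = Xs[0]                                # the point picked by the user
gain_X = first_coordinate_gains(X_k, Xs)   # X_k . X_j for each X_j
gain_Y = first_coordinate_gains(X_k, Ys)   # X_k . Y_i for each Y_i

# Every X moves further right than every Y on this iteration.
separated_further = min(gain_X) > max(gain_Y)
```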

To perceive the similarity of the algorithm to the method of the
Perceptron Convergence Theorem, one has only to consider the following
version of this work as presented by Minsky and Papert [5]:

"The Perceptron Convergence Theorem"
Consider the following program, in which the vector
notation $A \cdot \Phi$ is $\sum_{\varphi} \alpha_{\varphi} \varphi(X)$ (in place of our usual notation)

START     Choose any value for A
TEST      Choose an X from $F^+ \cup F^-$
          If $X \in F^+$ and $A \cdot \Phi > 0$ go to TEST
          If $X \in F^+$ and $A \cdot \Phi \le 0$ go to ADD
          If $X \in F^-$ and $A \cdot \Phi < 0$ go to TEST
          If $X \in F^-$ and $A \cdot \Phi \ge 0$ go to SUBTRACT
ADD       Replace A by $A + \Phi(X)$
          Go to TEST
SUBTRACT  Replace A by $A - \Phi(X)$
          Go to TEST

We assume until further notice that there exists a vector A* with
the property that if $X \in F^+$ then $A^* \cdot \Phi(X) > 0$ and if $X \in F^-$ then
$A^* \cdot \Phi(X) < 0$. The perceptron convergence theorem then states
that whatever choice function is used in TEST, the vector will
be changed only a finite number of times."
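
The quoted program translates almost line for line into Python. The sketch below is an illustration only: the choice function in TEST is taken to be a repeated sweep through both sets (one of many admissible choices), the feature vectors stand in for the values Φ(X), and the cap `max_passes` is an added safeguard for data that are not separable.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def perceptron(F_plus, F_minus, A, max_passes=1000):
    """Run the START/TEST/ADD/SUBTRACT program on lists of feature vectors.

    Returns the final vector A and the number of times it was changed.
    """
    A = list(A)
    changes = 0
    for _ in range(max_passes):
        changed = False
        for phi in F_plus:
            if dot(A, phi) <= 0:                      # ADD
                A = [a + p for a, p in zip(A, phi)]
                changes += 1
                changed = True
        for phi in F_minus:
            if dot(A, phi) >= 0:                      # SUBTRACT
                A = [a - p for a, p in zip(A, phi)]
                changes += 1
                changed = True
        if not changed:        # a full sweep with no change: all tests pass
            return A, changes
    raise RuntimeError("no separating vector found within max_passes")


# Illustrative, linearly separable feature vectors.
F_plus = [[1.0, 1.0], [2.0, 0.5]]
F_minus = [[-1.0, -1.0], [-0.5, -2.0]]
A_final, n_changes = perceptron(F_plus, F_minus, [0.0, 0.0])
```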
The variable A of the above theorem corresponds to the A of the present
algorithm; there is no B vector in the Perceptron theorem approach. The
Perceptron method can be considered to be a more direct method than
that presented for the algorithm here since it specifies that only
misclassified points will be iterated on. Naturally this approach is
allowable but not required in the present work. If it is assumed that,
in employing the algorithm, the user conforms to the modus operandi of
the Perceptron Convergence Theorem, it is possible to prove that if the
data is linearly separable the algorithm will attain separation within a
finite number of steps. The following proof is a modification of the
method employed by Duda and Hart [3]; it shows that, provided there
exists an orientation of the A-B plane in which the X subset is linearly
separable from the Y subset, this orientation can be attained with a
finite number of iterations. Suppose the situation in the A-B plane is
as indicated in Figure IV-1.

Figure IV-1. Orientation of the A-B plane in which the
X and Y sets are linearly separated

Let b=B/SMAX and a=A/TMAX. Then

$$f(Z) = b \cdot Z - m(a \cdot Z) + \alpha$$

where m is the slope of the line f = 0, $-\alpha$ is its y intercept, and Z is an X or Y. Then

$$f(Z) = (b - ma) \cdot Z + \alpha$$

To simplify this expression the following n+1 dimensional (where n is the
dimension of A, B, and Z) vectors are defined:

$$T = \binom{b - ma}{\alpha} \quad \text{and} \quad S = \binom{Z}{1}$$

Then clearly we can write

$$f'(S) = T \cdot S$$

Then f'(S) = 0 is a hyperplane which passes through the origin of the n+1
dimensional "S" space. As Duda and Hart point out, the addition of the
(n+1)st dimension as a one to the n-dimensional Z preserves all distance
relationships among the points of the original data set. The S vectors
are all in the same n-dimensional subspace of the n+1 space. This
translation into n+1 space reduces the problem from examining the vectors
a and b and the scalars $\alpha$ and m to the problem of inspecting the vector
T. From Figure IV-1 it is apparent that $f'\binom{X}{1} < 0$ for all X, and
$f'\binom{Y}{1} \ge 0$ for all Y.

Now defining a variable U such that:

$$U = -\binom{X}{1} \text{ for all } X$$
$$U = \binom{Y}{1} \text{ for all } Y$$

the problem is confined to discussing $f'(U) = T \cdot U > 0$ for all U. The
behavior of the user is such that if he picks a Y such that $f'\binom{Y}{1} < 0$, he
causes the program to iterate with T = T+Y. Similarly if he picks X such
that $f'\binom{X}{1} \ge 0$, he iterates with T = T-X. In either case the iteration can
be considered as picking a U and iterating with T = T+U. The iterations
are conducted only when the function as formed misclassifies the U picked.

Let $T_k$ be the value of the T vector on the kth iteration; $U_k$ is the
point picked for the kth iteration; and e is a positive scalar. Also $\hat{T}$ is
the "solution" vector T. Then

$$(T_{k+1} - e\hat{T}) = (T_k - e\hat{T}) + U_k \quad (a)$$

$$\|T_{k+1} - e\hat{T}\|^2 = \|T_k - e\hat{T}\|^2 + 2(T_k - e\hat{T}) \cdot U_k + \|U_k\|^2 \quad (b)$$

However $T_k \cdot U_k \le 0$. Thus

$$\|T_{k+1} - e\hat{T}\|^2 \le \|T_k - e\hat{T}\|^2 - 2e\hat{T} \cdot U_k + \|U_k\|^2 \quad (c)$$

Since $\hat{T} \cdot U_k$ is positive, the second term on the right of equation (c) will
dominate the third if e is sufficiently large. In particular let
$r = \max_i \|U_i\|$ and $s = \min_i \hat{T} \cdot U_i$ for $i = 1, 2, \ldots, m_1 + m_2$
(assume there are $m_1$ X's and $m_2$ Y's). Then

$$\|T_{k+1} - e\hat{T}\|^2 \le \|T_k - e\hat{T}\|^2 - 2es + r^2 \quad (d)$$

Now if $e = r^2/s$,

$$\|T_{k+1} - e\hat{T}\|^2 \le \|T_k - e\hat{T}\|^2 - r^2 \quad (e)$$

Thus the squared distance from $T_k$ to $e\hat{T}$ is reduced by at least $r^2$ at
each iteration. After K iterations

$$\|T_{K+1} - e\hat{T}\|^2 \le \|T_1 - e\hat{T}\|^2 - Kr^2 \quad (f)$$

Since the squared distance can not become negative, it follows that
the sequence must terminate after no more than $K_0$ iterations, where

$$K_0 = \|T_1 - e\hat{T}\|^2 / r^2$$

Thus the number $K_0$ provides an upper bound on the number of
iterations required to transform the initial vector $T_1$ to the solution
vector $\hat{T}$.
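
The bound can be checked numerically on a small example. The sketch below is illustrative only: the function names, the data, and the cyclic order in which misclassified points are picked are all assumptions. It computes $K_0$ from a known solution vector and counts the corrections actually made by the rule T = T+U.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def iteration_bound(U, T_hat, T1):
    """K0 = ||T1 - e*T_hat||^2 / r^2 with r = max ||U_i||, s = min T_hat.U_i,
    and e = r^2 / s. Requires T_hat.U_i > 0 for every U_i."""
    r2 = max(dot(u, u) for u in U)
    s = min(dot(T_hat, u) for u in U)
    if s <= 0:
        raise ValueError("T_hat is not a solution vector for U")
    e = r2 / s
    diff = [t - e * th for t, th in zip(T1, T_hat)]
    return dot(diff, diff) / r2


def count_corrections(U, T1, max_passes=10000):
    """Sweep through U, adding each misclassified U (T.U <= 0) to T."""
    T, k = list(T1), 0
    for _ in range(max_passes):
        changed = False
        for u in U:
            if dot(T, u) <= 0:
                T = [t + ui for t, ui in zip(T, u)]
                k += 1
                changed = True
        if not changed:
            return T, k
    raise RuntimeError("did not terminate within max_passes")


# Y's augmented with a one; X's augmented with a one and negated.
U = [[1.0, 1.0, 1.0], [2.0, 0.5, 1.0], [1.0, 1.0, -1.0], [0.5, 2.0, -1.0]]
T_hat = [1.0, 1.0, 0.0]     # a known solution: T_hat.U_i > 0 for all i
T1 = [0.0, 0.0, 0.0]

K0 = iteration_bound(U, T_hat, T1)
T_final, k = count_corrections(U, T1)
```

In this example one correction suffices, comfortably within the computed bound.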


V. DISCUSSION OF THE COMPUTER-GRAPHICS PROGRAM


The construction of the program formulated to implement the separa- 
tion algorithm will be discussed in this section, primarily to demonstrate 
what operations are performed on the data during the execution of the 
program. Instructions for the use of the program are provided in Appendix 
el. 

The program is written in FORTRAN IV as modified for the XDS9300 
computer installation at the Naval Postgraduate School. It employs the 
library graphics program "MAD" to allow use of the AGT graphics console 
with function switches and lightpen to alter the graphics display and to 
provide printed output via the line printer of the display. The main 
program can be considered to consist of four parts: 

1) Set Up

2) Projection Calculation And Display

3) Display Modification

4) Output

In the set up portion storage is established in the 9300 to handle
arrays based on the dimension and size of the set or sets, if there are
two, of data. The data is read into the computer via two formatted READ
statements. The first data set is labelled Y and the second is X (for the
pattern classification problem). If a cluster analysis problem is to be
conducted the variable M (number of members in the second set) is set
to zero. The program then reads the data in as one group of Y's. The
only modification made directly to the data follows. The data as a whole
is centered. For example in three dimensions, three averages DAV(1),
DAV(2), and DAV(3) are computed over the elements of the members of
the population of the X's and Y's. Then each $Y = (y_1, y_2, y_3)$ is modified
to $(y_1 - \mathrm{DAV}(1), y_2 - \mathrm{DAV}(2), y_3 - \mathrm{DAV}(3))$ and similarly
$X = (x_1, x_2, x_3)$ is translated to
$(x_1 - \mathrm{DAV}(1), x_2 - \mathrm{DAV}(2), x_3 - \mathrm{DAV}(3))$.


2 3 


The final inputs of the set up section are the initial values of the
A and B vectors. The program then proceeds into the calculation of the
projection coordinate pairs of each Y and X; the second part of the program
now starts.

Through a succession of DO-loops the projections of the Y's and
X's are calculated. The AGT screen is set up to display two-dimensional
points (G,H), where $-1 \le G,H \le 1$. This necessitates that the projections
be appropriately scaled at each iteration. The scaling is accomplished
by searching through all pairs for the maximum absolute values of each
of the two coordinates. The first is called TMAX, the second, SMAX.
Then if a point Y maps into a projected point (YY1,YY2), the point
displayed on the AGT screen will have coordinates (YY1/TMAX, YY2/SMAX).
The program then proceeds to display the set of the projections generated
from the Y's followed by the projections generated from the X's. With
the presentation of these points, the user, using the function switches
and the lightpen, can pick an X or Y and cause an iteration by picking the
direction of shift (LEFT, RIGHT, UP, or DOWN); this occurs in the third
portion of the program.

If the operator chooses to make another iteration by picking an X
or Y on the screen and specifying a direction, the program identifies the
data vector corresponding to the chosen point and, using the centered
version of this data point, calculates the new A or B vector. A loop is
then made back to the beginning of the second section. There are options
in addition to making an iteration open to the user. He can cause the
program to loop back to the first section for inputting new values for the
A and/or B vectors. He does this by depressing the switch "NEW A&B".
By pushing the function switch "REGRESS", he can return to the display
previous to the present. Finally by pushing "OUTPUT" he can cause the
program to proceed to section four.

In section four the program fills a sixty-one by thirty-seven
dimensional array with symbols corresponding to the present AGT view.
This array prints out as a six by six inch square. The square is outlined
by asterisks except for four zeros indicating the intercepts of the coordinate
axes with the edge of the square. The points within the square appear as
an X, 0, J, 8, or E. The X and 0 correspond respectively to the position
of an X or Y projection in an array cell. An 8 corresponds to two or more
Y's in a cell; an E implies two or more X's. A J indicates that a combination
of X's and Y's occupies the cell in question. When the printout is
complete, control is shifted to section three to allow the user to decide
what to do next.
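
The symbol conventions of the printout can be sketched as follows. The details here are assumptions made for illustration: the array is taken to be 61 print columns by 37 print lines, the scaled points are mapped into the cells inside the border, and the names `cell`, `symbol`, and `render` are hypothetical. Only the symbol meanings (X, 0, 8, E, J), the asterisk outline, and the four axis zeros come from the description above.

```python
COLS, ROWS = 61, 37  # assumed orientation: 61 columns by 37 lines


def cell(x, y):
    """Map a scaled pair in [-1, 1] x [-1, 1] to an interior (row, col)."""
    col = 1 + round((x + 1) / 2 * (COLS - 3))
    row = 1 + round((1 - y) / 2 * (ROWS - 3))
    return row, col


def symbol(nx, ny):
    """Printed character for a cell holding nx X's and ny Y's."""
    if nx and ny:
        return "J"
    if nx:
        return "X" if nx == 1 else "E"
    if ny:
        return "0" if ny == 1 else "8"
    return " "


def render(x_pairs, y_pairs):
    """Return the printout as a list of ROWS strings of length COLS."""
    counts = {}
    for x, y in x_pairs:
        nx, ny = counts.get(cell(x, y), (0, 0))
        counts[cell(x, y)] = (nx + 1, ny)
    for x, y in y_pairs:
        nx, ny = counts.get(cell(x, y), (0, 0))
        counts[cell(x, y)] = (nx, ny + 1)
    grid = [[" "] * COLS for _ in range(ROWS)]
    for r in range(ROWS):
        for c in range(COLS):
            if r in (0, ROWS - 1) or c in (0, COLS - 1):
                grid[r][c] = "*"
    mid_r, mid_c = ROWS // 2, COLS // 2
    for r, c in ((0, mid_c), (ROWS - 1, mid_c), (mid_r, 0), (mid_r, COLS - 1)):
        grid[r][c] = "0"  # axis intercepts on the border
    for (r, c), (nx, ny) in counts.items():
        grid[r][c] = symbol(nx, ny)
    return ["".join(row) for row in grid]


page = render([(0.5, 0.5), (0.5, 0.5), (0.0, 0.0)], [(0.0, 0.0), (-1.0, -1.0)])
```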

It is clear that the only modification to the data is the centralization
to deviations about the means performed in section one. The projection
of an n-dimensional point, for example, Y, has coordinates
$(A \cdot (Y - \bar{Y})/\mathrm{TMAX}, B \cdot (Y - \bar{Y})/\mathrm{SMAX})$, where
$\bar{Y}$ denotes the vector of means used for centering. As mentioned earlier, this section of
the thesis is only intended to describe the operation of the program and
to depict the relation between an initial data point and its eventual
projection.

The compilation time for this program is approximately eight minutes. 
Of course, the total running time for the program varies with the dimen- 
sionality of the problem, the size of the data set(s), and with the number 
of iterations required by the user. The running time per iteration in a
six dimensional problem with two hundred and eighty data points was
found to be approximately twenty seconds. A user can expect to spend
between thirty and forty-five minutes making iterations on a problem of
this magnitude. If there is no reasonably clear-cut separation between
the sets of a pattern classification problem or between the clusters of a
cluster analysis, the running time may be considerably longer.


VI. EXAMPLES AND APPLICATIONS


As a means of verifying the proper operation of the separation 
algorithm in the graphics program, four data sets were generated for 


pattern classification analysis as follows: 


1) Two three-dimensional cubes 

2) Two six-dimensional cubes 

3) Four six-dimensional cubes 

4) Four dimensional sphere within a four-dimensional shell 


Each of these data sets possessed geometric qualities which could
easily be perceived, thus allowing concrete evaluation of the operation
of the program on multidimensional data. Before addressing each data
set individually, a few comments are appropriate. Each set consists of
two differentiable subsets labelled X and Y. The format of the outputs
is as described earlier, with points being output as X, O, 8, E, or J.
The vector C in the discriminant functions is the centering vector
defined in Section III. Coordinate axes are drawn in. The points picked
for iteration are circled, and the direction for the iteration is
indicated by an arrow extending from the circle.

The tuples of numbers which make up each subset were generated on the
IBM 360 as uniformly distributed random numbers over appropriate
ranges. In each case the number of elements per subset depended on the
dimensionality of the problem; for n dimensions each subset should have
at least $2^n$ members. This lower bound is set to ensure some
reasonable degree of definition for the subsets in the n-space. The
particulars of each data set and the results of the sample runs will now
be discussed sequentially.

CASE 1. The first problem to which the algorithm was applied involved
the separation of two three dimensional unit cubes. Both the X and Y
subsets consisted of thirty-three tuples. If $X=(x_1,x_2,x_3)$ then
$x_i$ for i=1,3 was generated as a random number, uniform (0,1); if
$Y=(y_1,y_2,y_3)$ then $y_i$ for i=1,3 was generated uniform (1,2).
Thus the X subset can be thought of as being drawn from the unit cube
with one vertex at the origin and lying in the three-space. The Y subset
was drawn from the unit cube displaced diagonally from the origin by the
X cube.

The two cubes have the vertex (1,1,1) as a point in common; however in
accordance with the requirement that the sets be disjoint, (1,1,1) was
considered to be a possible Y value.
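The generation of the two cubes, together with the $2^n$ lower bound on subset size, can be sketched as follows. This is a minimal illustration under stated assumptions: Python's `random` module stands in for the IBM 360 generator, and the function name `make_cube_sample` and the seeds are hypothetical.

```python
import random


def make_cube_sample(n, count, low, high, seed=0):
    """Draw `count` n-tuples with every coordinate uniform on [low, high]."""
    if count < 2 ** n:
        # Lower bound quoted in the text: at least 2**n members per subset.
        raise ValueError("subset smaller than the 2**n lower bound")
    rng = random.Random(seed)  # stand-in for the original generator
    return [tuple(rng.uniform(low, high) for _ in range(n))
            for _ in range(count)]


# Case 1 as described: X from the cube (0,1)^3, Y from the cube (1,2)^3.
x_set = make_cube_sample(3, 33, 0.0, 1.0, seed=1)
y_set = make_cube_sample(3, 33, 1.0, 2.0, seed=2)
```

The same function covers the six-dimensional cubes of the later cases by changing `n` and the coordinate ranges.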

Figures VI-1-1 through VI-1-3 show a three step sequence in the
application of the algorithm. In Figures VI-1-1 and VI-1-2 the Y
(indicated as O's) and X points are mixed; however satisfactory
separation is obtained in Figure VI-1-3. The values of the A and B
vectors are as follows:

Figure VI-1-1   A(1)= -.5507    B(1)=-1.1837
                A(2)=  .5069    B(2)= -.7223
                A(3)= -.1261    B(3)= 1.4448
Figure VI-1-2   A(1)= -.3283    B(1)=-1.1837
                A(2)=  .5668    B(2)= -.7220
                A(3)= -.0842    B(3)= 1.4448
Figure VI-1-3   A(1)=  .5713    B(1)=-1.1837
                A(2)=  .5229    B(2)= -.7229
                A(3)=  .6769    B(3)= 1.4448

Figure VI-1-1. CASE 1, FIRST STEP
Figure VI-1-3. CASE 1, THIRD STEP
Figure VI-1-4. PHOTO OF CASE 1, FIRST STEP
Figure VI-1-5. PHOTO OF CASE 1, SECOND STEP
Figure VI-1-6. PHOTO OF CASE 1, THIRD STEP


From Figure VI-1-3 a candidate discriminant function can be clearly
seen as

$$f(Z) = A \cdot (Z-C)/\mathrm{TMAX}$$

where A has the component values listed for Figure VI-1-3 above, and
TMAX=1.5198. Thus if f(Z) > 0, Z is classified as a Y; if f(Z) < 0, Z is
classified as an X.
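Applying this discriminant function to a new point takes only a few lines. The sketch below uses the A vector and TMAX quoted for Figure VI-1-3. The centering vector C is not listed in this section, so the value (1, 1, 1), the center of the combined region from which the two cubes were drawn, is used purely as an assumption; the actual C is the one computed from the sample as defined in Section III.

```python
A = (0.5713, 0.5229, 0.6769)   # A vector listed for Figure VI-1-3
TMAX = 1.5198                  # scale factor quoted in the text
C = (1.0, 1.0, 1.0)            # ASSUMED centering vector, for illustration only


def f(z):
    """Linear discriminant f(Z) = A . (Z - C) / TMAX."""
    return sum(a * (zi - ci) for a, zi, ci in zip(A, z, C)) / TMAX


def classify(z):
    """Return 'Y' when f(Z) > 0 and 'X' when f(Z) < 0."""
    value = f(z)
    if value > 0:
        return "Y"
    if value < 0:
        return "X"
    return "undecided"         # f(Z) = 0 lies on the boundary itself
```

Under the assumed C, a point such as (0.2, 0.3, 0.1) from the X cube gives a negative value, while (1.5, 1.7, 1.9) from the Y cube gives a positive one.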

Figures VI-1-4 through VI-1-6 are provided to show the degree of
correlation between the view on the AGT screen and the resulting printed
output. Figures VI-1-4, VI-1-5, and VI-1-6 correspond respectively to
the outputs designated Figures VI-1-1, VI-1-2, and VI-1-3; the high
degree of similarity between the related figures is clear. The results
of this comparison can be readily extrapolated to the remaining problems
of this section and to applications of the algorithm in general;
therefore, photographic evidence for the remaining cases need not be
included.

CASE 2. For the second application the dimensionality of the problem
was increased from three to six. The Y and X subsets were composed of
seventy-six tuples generated from unit cubes oriented in six-space in
the same fashion as the unit cubes of case one were in three-space.
Thus a sample from the Y subset would be $Y=(y_1,y_2,y_3,y_4,y_5,y_6)$
where $1 \le y_i \le 2$ for i=1,6; and for X the correspondence would be
$X=(x_1,x_2,x_3,x_4,x_5,x_6)$, $0 \le x_i \le 1$ for i=1,6. Figures
VI-2-1 through VI-2-3 depict the separation algorithm applied to this
problem. Note that except for the greater number of data points
involved, the complexity of the problem is not increased from the three
dimensional problem of case one. The A and B vectors for these three
views are as follows:


Figure VI-2-1   A(1)= 1.3448    B(1)=  .0868
                A(2)= [illegible]  B(2)=-1.4109
                A(3)=-1.9594    B(3)= -.7244
                A(4)=  .7525    B(4)=-1.0185
                A(5)= 1.3047    B(5)= 2.9777
                A(6)= -.7075    B(6)=  .4225
Figure VI-2-2   A(1)= 1.3448    B(1)=  .6450
                A(2)=  .3701    B(2)= -.5899
                A(3)=-1.9594    B(3)=  .0692
                A(4)=  .7325    B(4)= -.3409
                A(5)= 1.3047    B(5)= 3.0208
                A(6)= -.7075    B(6)=  .7001
Figure VI-2-3   A(1)= 1.3448    B(1)=  .7298
                A(2)=  .3701    B(2)=  .4801
                A(3)=-1.9594    B(3)=  .5716
                A(4)=  .7525    B(4)=  .5345
                A(5)= [illegible]  B(5)= 3.0897
                A(6)= -.7075    B(6)=  .7335





Figure VI-2-1. CASE 2, FIRST STEP
Figure VI-2-2. CASE 2, SECOND STEP
Figure VI-2-3. CASE 2, THIRD STEP

From Figure VI-2-3 a reasonable discriminant function would be

$$f(Z) = B \cdot (Z-C)/\mathrm{SMAX}$$

where B has the component values listed for Figure VI-2-3 above and
SMAX=3.1940. Thus a vector Z for which f(Z) > 0 would be assigned as
a Y; if f(Z) < 0 the result would be classification of the Z as an X.

CASE 3. In the last problem, separation was obtained between two sets
of six-dimensional data. The aim of the problem in case three is to
complicate the situation by generating two six-dimensional subsets,
labelled Y and X, each of which can further be described as consisting
of two subsets. The purpose of the application is to see if the
algorithm would cause a separation to appear not only between the Y and
X sets, but also within these sets. In a sense this problem is a
combination of a pattern classification application and a cluster
analysis problem. The Y and X sets each consist of one hundred and forty
six-tuples; these sets were additionally generated as two subsets of
seventy members each. Thus the first seventy points in the Y subset were
generated as $Y=(y_1,y_2,y_3,y_4,y_5,y_6)$ where $0 \le y_i \le 1$ for
i=1,4 and $1 \le y_i \le 2$ for i=5,6. In the second subset of Y's,
$0 \le y_i \le 1$ for i=1,6. The first seventy X's were typically
$X=(x_1,x_2,x_3,x_4,x_5,x_6)$ where $0 \le x_i \le 1$ for i=1,5 and
$1 \le x_6 \le 2$; for the second X subset, $0 \le x_i \le 1$ for i=1,4,
$1 \le x_5 \le 2$, and $0 \le x_6 \le 1$. As in case two the increased
complexity of the situation is reflected immediately by the number of
points involved; the computational aspects are unaffected. Once again
only three figures are required to display separation. The A and B
vectors assume values as indicated:

Figure VI-3-1   A(1)=  .0503    B(1)= [illegible]
                A(2)= -.3371    B(2)= -.5137
                A(3)=  .5022    B(3)= 1.4294
                A(4)= 1.1334    B(4)= -.5413
                A(5)= [illegible]  B(5)=  .6996
                A(6)= 1.5314    B(6)=-1.2124

Figure VI-3-2   A(1)= 1.6166    B(1)= [illegible]
                A(2)=-1.7057    B(2)= -.5137
                A(3)=  .6496    B(3)= 1.4294
                A(4)= 1.2731    B(4)= -.5413
                A(5)=  .9450    B(5)=  .6996
                A(6)= 1.9902    B(6)=-1.2124

Figure VI-3-3   A(1)= 1.6166    B(1)=-1.5022
                A(2)= [illegible]  B(2)=-2.1664
                A(3)=  .6496    B(3)= 1.4117
                A(4)= 1.2731    B(4)= -.3975
                A(5)=  .9450    B(5)=  .6192
                A(6)= 1.9902    B(6)=-1.4552


The discriminant function will obviously be of a more complicated nature,
judging from the appearance of Figure VI-3-3. The following function
appears to be satisfactory:

$$f(Z) = \big(A \cdot (Z-C)/\mathrm{TMAX} + B \cdot (Z-C)/\mathrm{SMAX}\big)\big(A \cdot (Z-C)/\mathrm{TMAX} - B \cdot (Z-C)/\mathrm{SMAX}\big)$$

where A and B assume values indicated for Figure VI-3-3, TMAX=7.3396,
and SMAX=7.5366. For this problem, f(Z) > 0 would result in Z being
classified as an X; f(Z) < 0 naturally would result in Y classification.

Figure VI-3-2. CASE 3, SECOND STEP
Figure VI-3-3. CASE 3, THIRD STEP

CASE 4. In the first data sets of this section the points were
generated from multi-dimensional cubes. In case four a different
orientation is used, this time in four dimensions. Twenty-four tuples
are generated for each of the X and Y sets. The Y set is generated so
that a typical point $Y=(y_1,y_2,y_3,y_4)$ satisfies
$2 \le y_1^2 + y_2^2 + y_3^2 + y_4^2 \le$ [illegible]. Similarly an X
point satisfies $x_1^2 + x_2^2 + x_3^2 + x_4^2 \le 1$. Thus the
geometric "picture" is that of a sphere surrounded by a shell. For this
application, two sets of figures are provided to emphasize the
importance of the initial values assigned to the A and B vectors.
Figures VI-4-1 through VI-4-5 display the results of making an
unfortunate choice of initial values for A and B. Thirty iterations were
carried out from this initial choice; however, no significant
improvement on the situation as indicated by Figure VI-4-5 was made. The
values of the A and B vectors were as follows:

Figure VI-4-1   A(1)=  .5780    B(1)= -.5000
                A(2)=  .5780    B(2)= -.5000
                A(3)=  .5780    B(3)= 1.0000
                A(4)=  .0000    B(4)= 1.0000
Figure VI-4-2   A(1)=  .6680    B(1)= -.5000
                A(2)= -.2623    B(2)= -.5000
                A(3)= -.0208    B(3)= 1.0000
                A(4)=  .0900    B(4)= 1.0000
Figure VI-4-3
Figure VI-4-4
Figure VI-4-5

Rather than attempt to find a satisfactory discriminant function for the
situation depicted in Figure VI-4-5, the approach is to specify new A and
B vectors and to start over. In fact, starting with A=(1,1,1,1) and
B=(1,-1,1,-1), separation appeared within twenty iterations. Figures
VI-4-6 through VI-4-10 depict the situation after the first, fifth,
tenth, fifteenth, and twentieth iterations respectively. The A and B
vectors for the figures are:

Figure VI-4-6   A(1)= 1.0000    B(1)= 1.0000
                A(2)= 1.0000    B(2)=-1.0000
                A(3)= 1.0000    B(3)= 1.0000
                A(4)= 1.0000    B(4)=-1.0000
Figure VI-4-7
Figure VI-4-8
Figure VI-4-10

Figure VI-4-1. CASE 4, FIRST STEP
Figure VI-4-2. CASE 4, SECOND STEP
Figure VI-4-3. CASE 4, THIRD STEP
Figure VI-4-4. CASE 4, FOURTH STEP
Figure VI-4-5. CASE 4, FIFTH STEP
The situation in Figure VI-4-10 provides another opportunity for
displaying the flexibility possible in choosing a discriminant function.
The function in this case is circular with the equation:

$$f(Z) = \big(A \cdot (Z-C)/\mathrm{TMAX} - 1/3\big)^2 + \big(B \cdot (Z-C)/\mathrm{SMAX}\big)^2 - 1/4$$

where A and B have the values specified for Figure VI-4-10, TMAX=3.3577,
and SMAX=5.8823. Clearly if f(Z) < 0, Z is classified as an X; if
f(Z) > 0, Z is classified as a Y.

Figure VI-4-6. CASE 4, SIXTH STEP
Figure VI-4-7. CASE 4, SEVENTH STEP
Figure VI-4-8. CASE 4, EIGHTH STEP
Figure VI-4-9. CASE 4, NINTH STEP
Figure VI-4-10. CASE 4, TENTH STEP
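The circular discriminant function given for Figure VI-4-10 describes a circle of radius 1/2 centered at (1/3, 0) in the projection plane. The sketch below evaluates it directly on the plane coordinates t = A·(Z-C)/TMAX and s = B·(Z-C)/SMAX, since the A and B values for that figure are not legible here. The function names are hypothetical.

```python
def circular_f(t, s):
    """f = (t - 1/3)^2 + s^2 - 1/4 for plane coordinates (t, s)."""
    return (t - 1.0 / 3.0) ** 2 + s ** 2 - 0.25


def classify_case4(t, s):
    """Inside the circle (f < 0) is X; outside (f > 0) is Y."""
    value = circular_f(t, s)
    if value < 0:
        return "X"
    if value > 0:
        return "Y"
    return "undecided"   # exactly on the circle
```

For example, the circle's center (1/3, 0) is classified as an X, while a corner of the display such as (1, 1) is classified as a Y.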
It should be noted that the separation found was a result of the
particular samples used. In reality the Y elements must have formed a
ring instead of a shell. Clearly there is no possible orientation for the
A-B plane into which the sphere and shell could be projected so as to
display separation. This is a case where the samples were in fact too
small to allow a discriminant function to be derived which would be
applicable to the parent populations.

Cases five and six represent applications of the separation program
to "real world" data sets. The complexities found in actual data are
reflected by the apparent reduction in the performance level of the
algorithm.

CASE 5. The data set for this cluster analysis problem is the
Jamaican "Fossil" data mentioned in the discussion of Chernoff's method
in section two of this paper. The data set, in the form of eighty-two
eight-tuples, was drawn from Wright & Switzer's paper "Numerical
Classification Applied to Certain Jamaican Eocene Nummulitids" [6].

As indicated earlier in this paper, the data was represented as a
single set of points Y (points on the output are indicated only as O's or
8's), and iterations were made based on apparent separation; the aim is
to emphasize this separation. Twelve iterations were made starting with
A=(1,1,1,1,1,1,1,1) and B=(1,-1,1,-1,1,-1,1,-1). Figures VI-5-1
through VI-5-6 were output on the first, second, third, fourth, ninth,
and twelfth iterations respectively. Since no discriminant function is
generated in a cluster analysis, the values of the A and B vectors have
no final importance. On the other hand, the precise identity of each
Y point in the output is now important; this information is deducible from
the point coordinates which are also part of the program output. The
author found that the graphic output provided indications generally of two
clusters; one consisted of points one through thirty-eight and
seventy-three through eighty-two, and the other, of points thirty-nine
through seventy-two. These clusters are indicated in Figure VI-5-5, which
represents the situation after nine iterations. Comparing these results
to those of Wright & Switzer, who found three clusters, it appears that
the present method failed to differentiate between their groups one and
three. The clusters found by Wright & Switzer have been indicated in
Figure VI-5-6 (which represents the situation after three more
iterations), identifying them by the coordinates of their respective
members rather than by operator observation. Inspection of the positions
of these groupings indicates that the clustering scheme proposed by
Wright & Switzer might eventually be attained or at least approached by
the algorithm's application. An important point is that Wright & Switzer
required ninety-six iterations to arrive at their solution; all results
of the present method were found in less than thirteen iterations.

Figure VI-5-1. CASE 5, FIRST STEP
Figure VI-5-2. CASE 5, SECOND STEP
Figure VI-5-3. CASE 5, THIRD STEP
Figure VI-5-4. CASE 5, FOURTH STEP
Figure VI-5-5. CASE 5, FIFTH STEP
Figure VI-5-6. CASE 5, SIXTH STEP

CASE 6. The second "real world" problem to be approached with
this algorithm is a thirteen dimensional pattern classification problem.
Each thirteen-tuple represents a set of medical readings taken from
coronary patients at Letterman Hospital in the case of the Y's, and taken
from individuals receiving physical examinations at Fort Ord Hospital for
the X's. The Letterman patients were known to have been suffering
from coronary disease. No such concrete statement can be made for the
Fort Ord examinees. For the latter it was not absolutely known that they
did not have heart disease. This uncertainty along with the appearance of
the data indicated that if separation existed, it would probably not be
well defined. Certain of the features of each thirteen-tuple represented
measurable quantities such as age or weight. Other features such as
race, medical history, and smoking habits were classifiers rather than
measurements.

The application of the algorithm failed to achieve a clear cut
separation between the two sets; however as the sequence of Figures
VI-6-1 through VI-6-6 indicates, relatively high concentrations of Y's
and X's were attained. The A and B vectors for Figure VI-6-6 are as
follows:

Figure VI-6-6   A(1)=    .0000    B(1)=   -.9273
                A(2)=  37.0000    B(2)=-12.4091
                A(3)= -20.0000    B(3)=  7.0818
                A(4)=    .0000    B(4)= 38.3364
                A(5)=  -1.0000    B(5)=   .6455
                A(6)=  -2.0000    B(6)=  -.6909
                A(7)=  -1.0000    B(7)=   .1636
                A(8)=  -2.0000    B(8)= -1.4636
                A(9)=    .0000    B(9)=   .6636
                A(10)=   .0000    B(10)= [illegible]
                A(11)= -1.0000    B(11)=   .9818
                A(12)=-10.0000    B(12)= 69.0818
                A(13)=  3.0000    B(13)= -8.4455

Figure VI-6-1. CASE 6, FIRST STEP
Figure VI-6-2. CASE 6, SECOND STEP
Figure VI-6-3. CASE 6, THIRD STEP
Figure VI-6-4. CASE 6, FOURTH STEP
Figure VI-6-5. CASE 6, FIFTH STEP
Figure VI-6-6. CASE 6, SIXTH STEP

Obviously no clear cut recommendation for a discriminant function is
justified by Figure VI-6-6; however, the X data can be seen to be
concentrated in the third quadrant of the figure. This information should
be useful in making tentative discriminant judgements. A possible
discriminant function would be:

$$f(Z) = \begin{cases} B \cdot (Z-C)/\mathrm{SMAX} + \big(A \cdot (Z-C)/\mathrm{TMAX}\big)/10 & \text{for } A \cdot (Z-C) < 0 \\ B \cdot (Z-C)/\mathrm{SMAX} + 3\big(A \cdot (Z-C)/\mathrm{TMAX}\big)/4 & \text{for } A \cdot (Z-C) \ge 0 \end{cases}$$

This function classifies thirty-seven out of fifty X points and
fifty-three out of sixty Y's correctly. This result is not as
satisfactory as the results obtained in the first four cases; however it
is probably more reasonable to use it than to conduct the great number of
iterations likely to be required to find separation (if it exists) and a
more suitable discriminant function.
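The piecewise function and the quoted classification counts can be checked with a short sketch. It works in the plane coordinates t = A·(Z-C)/TMAX and s = B·(Z-C)/SMAX and assumes TMAX is positive, so that the sign of t matches the sign of A·(Z-C). The function names are hypothetical, and the sign convention used to assign a class is not stated in the text, so only the function value is computed.

```python
def case6_f(t, s):
    """Piecewise discriminant: s + t/10 for t < 0, s + 3t/4 for t >= 0."""
    weight = 0.1 if t < 0 else 0.75
    return s + weight * t


def fraction_correct(correct, total):
    """Fraction of points classified correctly."""
    return correct / total


x_rate = fraction_correct(37, 50)                  # X points: 0.74
y_rate = fraction_correct(53, 60)                  # Y points: about 0.883
overall = fraction_correct(37 + 53, 50 + 60)       # all points: about 0.818
```

Taken together, the counts in the text amount to ninety of one hundred and ten points, or roughly eighty-two percent, classified correctly.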







VII. CONCLUSIONS AND RECOMMENDATIONS 


Based on the results noted using the generated data sets, it can be
concluded that the separation algorithm proposed by Professor Cover is
satisfactorily implemented by the program written for this paper. Provided
that there exists an orientation for the A-B plane in which the projections
of the data points are separated, the algorithm furnishes an avenue for
describing this orientation. In both real world applications such an
orientation was approached, but not actually found. It is apparent that
dimensionality per se is not a problem for the separation algorithm. The
display for the thirteen-dimensional hospital problem was no more complex
than that for the generated set of four six-dimensional cubes.
Dimensionality can complicate implementation of the algorithm when the
range of satisfactory A-B plane orientations is small relative to the
sample space. A small range requires in general a greater number of
iterations. Once separation is achieved, definition of the discriminant
function is quite simple. Using the printed output of the program, the
discriminant function can be drawn in and its equation found as a
polynomial curve.

An important feature of this separation algorithm is that it makes
no distributional assumptions or requirements on the data. This is
valuable indeed when working with "real world" data sets. Additionally
the chain of operations on a data point is simple and easy to follow.
Once a discriminant function has been defined, it is easily applied to
new data points.
As a final comment, the importance of the man-machine approach
of this algorithm cannot be overestimated. Increased familiarity with the
data seems to reduce the number of iterations required for the achievement
of separation. As the same data set was used repeatedly, it was obvious
that the operator got a "feel" for the movement observed. Also the man's
importance is emphasized when it is realized that he makes all decisions
and judgements during the application. It was found that the choice of
X and Y as the identifiers for the two classes in the problem was
unfortunate. The similarity in appearance of these two letters caused some
confusion in the operator. It was to alleviate some of this confusion
that the labelling scheme for the output used X's and O's instead of
X's and Y's.

Recommendations for future work with the separation algorithm are
aimed at allowing the operator to gain a better grasp of the situation he
views on the screen of the AGT. First, the video display should be
changed from the present X-Y scheme to some other less confusing scheme
such as an X-O display.

Second, a feature should be available in the program to indicate
when specified densities of points are shown on the screen. For example,
when the density of X's in a part of the screen reaches a certain level,
the video of these X's could either go out of focus or assume a
distinguishable intensity.

Third, the application of the algorithm might be extended by making
the projection onto a surface other than a hyperplane. This modification
requires significant alteration to the work presented here.

Fourth, it might be helpful to the operator to have video in the
shape of an arrow appear by the point to be moved, indicating direction
of movement. Also since each member of the data set has an identifying
number, it might be helpful when choosing a point to have the number of
that point appear on the screen.
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APPENDIX II. PROGRAM OPERATING INSTRUCTIONS 


To ready the program for a set of data a total of eight cards must
be prepared and inserted into the deck. For the purposes of this appendix,
assume that there are GG Y points and HH X points. Furthermore, assume
all points are of dimension II. Then the required cards are as listed
below.

Four dimension cards are placed immediately after the card which
reads "DIMENSION IGDIR(3)"; this is the third card in the program. These
cards should read:

      DIMENSION DAV(II),TOT(II)

      DIMENSION IMAG(6*HH+1),IFRAM(5*GG+1),A(II),B(II)

      DIMENSION X(HH,II),Y(GG,II),XX1(HH),XX2(HH),YY1(GG),YY2(GG)

      DIMENSION JMAG(6*HH+1),JFRAM(5*GG+1)

Next, following card number fifteen of the program, which reads
"11=0.01", the cards to be entered are:

      M=HH

      N=GG

      K=II

Finally a format statement must be inserted into the program. The number
of the statement is 2187; further guidance is not practical since the
format statement is used in connection with the formatted input of the
data. As an example, this statement might read "2187 FORMAT(4F10.6)".
The statement must correspond to the format of the data cards. The data
cards are placed behind the program, and the deck is ready to run.
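To illustrate what the example statement FORMAT(4F10.6) implies for the layout of a data card, the following Python sketch reads four ten-column fields from one card image. It follows the usual FORTRAN rule that a field without a decimal point is given six implied decimal places. The function name `read_card` is hypothetical, a wholly blank field is read as zero, and embedded blanks within a field are not handled, so this is a simplified sketch rather than a full emulation of FORTRAN formatted input.

```python
def read_card(card, fields=4, width=10, decimals=6):
    """Read `fields` fixed-width numbers from one card image (Fw.d style)."""
    values = []
    for k in range(fields):
        text = card[k * width:(k + 1) * width].strip()
        if not text:
            values.append(0.0)                          # blank field read as zero
        elif "." in text:
            values.append(float(text))                  # explicit decimal point
        else:
            values.append(int(text) / 10 ** decimals)   # implied decimal point
    return values


card = "  0.550700 -1.183700  0.506900 -0.722300"
row = read_card(card)
```

A card punched without decimal points, such as one whose first field is `    550700`, yields the same first value of 0.5507.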
Instructions for operating the XDS9300 computer and the ADAGE AGT
console are contained in the publication entitled "ELECTRICAL ENGINEERING
COMPUTER LABORATORY" by R. D. Delaura [2]. Familiarity with this
document is a must before operating these computers. The following
instructions are a guide for this particular program.

1.  Energize XDS9300

    a.  Depress "RESET" and "POWER" simultaneously, and then "SENSE 2"
        on the 9300 console.

    b.  Turn the teletypewriter on.

    c.  Load the program deck into the card reader and depress "POWER"
        and "START" on the card reader.

    d.  Depress "READY" on the line printer.

2.  Energize the AGT Graphics Console

    a.  Depress "ON" and "RESET" at the AGT cabinet.

    b.  Turn on the disk drive.

    c.  When the ready light on the disk cabinet appears, energize the
        "THIS IS IT" circuit breaker on the back of the AGT cabinet.

    d.  On the AGT cabinet depress "HALT", "RESET", "RUN", and
        "PULSE 1".

    e.  The typewriter at the graphics console should type the date
        request: "MO/DA/YR=". Type this information in. If this request
        does not appear, the AGT must be "Bootstrapped". Guidance for
        Bootstrapping is available in the laboratory.

    f.  Type "RESET("MAD",pvv)!" where the p and v numbers indicate the
        location of MAD on the disk. PVV is normally either 101 or 104.
        When the carriage of the typewriter returns type "GATED!".

    g.  Place the function switch overlay shown in Figure A-2-1 over the
        function switches and energize the lightpen. The AGT console is
        now ready.

3.  Compile and execute the program

    a.  At the XDS9300 console depress "CLEAR" and "CLEAR FLAGS"
        simultaneously, and then "IDLE", "RESET", "RUN", and "CARDS".
        The card reader should start reading cards, and the line printer
        should produce a listing.

    b.  When execution commences, the teletype should print the
        statement "TYPE IDEV=2* AND A CARRIAGE RETURN IF USING AGT 2".
        The input light will then come on. Follow the instructions if
        using AGT 2; if using AGT 1, simply type an asterisk and a
        carriage return. The program then reads the data.


Figure A-2-1. FUNCTION SWITCH OVERLAY (legible switch labels: "END EDIT", "COMPLETE CHOICE")
    c.  At the AGT console the word "NAMELIST" and a blinking cursor
        will appear at the bottom of the screen. Type in the initial
        values of the A and B vectors. Any accepted vector notation is
        acceptable; the following, when possible (space limitations may
        require separate entry of component values), is recommended:

            A=1,1,1,1 and carriage return

            B=1,1,1,1 and carriage return

            * and carriage return

        If any mistakes are made in inputting the A or B vectors, the
        line printer will output an error message when the carriage
        return is depressed. Until the asterisk is typed, errors can be
        corrected without harm to the execution of the program. When the
        asterisk and carriage return are typed, the program starts
        computing projections and the graphics display of video X's and
        Y's appears.


4.  Iterations and output

    a.  To cause an iteration, the user pushes the switch "PICK X" or
        "PICK Y" as appropriate. Video appears at the bottom of the
        screen indicating that the program is in the graphics edit mode.
        The user depresses the button on the side of the lightpen and
        places the tip of the pen over the desired point; he then
        releases the button. The point selection is indicated when a
        portion of the picked letter vanishes. When the desired point is
        so indicated the operator presses the function switches
        "COMPLETE CHOICE" and "END EDIT". Finally he causes the
        iteration to occur by depressing one of the direction switches
        ("LEFT", "RIGHT", "UP", or "DOWN").

    b.  By pressing the "REGRESS" button, the user can negate the
        effects of the last iteration.

    c.  If the operator desires to initiate a new series of iterations,
        he depresses the "NEW A&B" button. The word "NAMELIST" and the
        flashing cursor appear on the screen, and the program is
        effectively at step 3c.

    d.  Finally by depressing the switch labelled "PLOT", the user
        causes the line printer to output a six by six inch
        representation of the situation he views on the AGT screen.


5.  Securing the computer

    a.  There is no feature in the program which causes execution to
        terminate. Instead when the operator is finished he secures the
        computer. First in securing the AGT, he depresses switch
        "PULSE 1" and types "HOME!". He depresses "HALT" and "RESET" at
        the AGT cabinet and stops the disk drive. He then flips the
        "THIS IS IT" circuit breaker to the off position.

    b.  The card reader and the teletype are turned off.

    c.  The 9300 is secured by depressing "IDLE", followed by "RESET"
        and "POWER ON" (simultaneously).
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APPENDIX III. DATA LISTING 


This appendix provides a listing of the data sets used in the 
illustrative examples presented in section six of the paper. The data 
is formatted so that for each case, a row of numbers in the listing 
represents a single tuple for that problem. Thus, for example, the data 
set for case one is presented in three columns, while that for case six 
is in thirteen columns. As an aid for reference, the following index to 


this appendix is provided: 
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